[5.7] [DNM] Cherry-pick batch test PR #443

Closed

hamishknight
Contributor

@hamishknight hamishknight commented May 27, 2022

Azoy and others added 25 commits July 21, 2022 20:13
Move options from RegexComponent to Regex
Remove the DSL -> _CharacterClassModel conversion,
and _CharacterClassModel's custom character class
matching logic, none of which is being used.
`makeDSLTreeCharacterClass` was the last API
that required it to be public. Remove it, and
replace it with some static members on `_AST.Atom`.
Map to `.newlineSequence` instead of `.newline`,
which allows it to create the correct consumer.

rdar://96330096
Explicitly disambiguate the fact we're talking
about `.`, which does not match newlines unless in
single line mode.
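A minimal sketch of the `.` behavior described above, written against the public `Regex` type in Swift 5.7+ rather than the internal DSL tree this commit touches. The pattern strings and variable names are illustrative assumptions.

```swift
// Requires the standard library `Regex` type (Swift 5.7+).
let dotInput = "a\nb"

// By default, `.` does not match a newline, so the whole-match fails.
let plainDot = try! Regex("a.b")
let defaultMatched = dotInput.wholeMatch(of: plainDot) != nil

// With single line mode enabled via `(?s)`, `.` also matches a newline.
let singleLineDot = try! Regex("(?s)a.b")
let singleLineMatched = dotInput.wholeMatch(of: singleLineDot) != nil

print(defaultMatched, singleLineMatched)
```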
This time as a "true any" that matches any
character, including newlines.
This should map to `.any`, not `.dot`.

rdar://96509234
This enum will start including cases that only the
DSL can use, so move it off the AST.
Introduce `startOfInput` and `endOfInput` assertion
kinds, and map the DSL to them such that they do
not depend on matching options.

rdar://97029630
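As a rough illustration of why input anchors should not depend on matching options, the sketch below uses the regex-syntax analogues `^` and `\A` instead of the DSL assertion kinds named in the commit, which are not shown here. The inputs and names are assumptions for the example.

```swift
// Requires the standard library `Regex` type (Swift 5.7+).
let anchorInput = "a\nb"

// In multiline mode, `^` also matches just after a newline.
let lineAnchored = try! Regex(#"(?m)^b"#)
let lineAnchorMatched = anchorInput.firstMatch(of: lineAnchored) != nil

// `\A` only matches at the very start of the input, even with `(?m)` set.
let inputAnchored = try! Regex(#"(?m)\Ab"#)
let inputAnchorMatched = anchorInput.firstMatch(of: inputAnchored) != nil

print(lineAnchorMatched, inputAnchorMatched)
```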
…ftlang#560)

This fixes infinite loops when we loop over an internal node that does not make any forward progress. Also included is an optimization to only emit the check/break instructions when a case might result in an infinite loop (a possibly non-progressing inner node combined with unlimited quantification).
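The forward-progress check can be sketched outside the matching engine as follows. This is not the engine's instruction sequence; `InnerMatcher`, `matchZeroOrMore`, and `zeroOrMoreA` are hypothetical names used only to show the idea of breaking out of an unbounded loop when an iteration consumes no input.

```swift
// Hypothetical: an inner matcher returns the new position on success, or nil on failure.
typealias InnerMatcher = (_ input: [Character], _ position: Int) -> Int?

// Repeats `inner` zero or more times, stopping when an iteration
// succeeds without consuming any input.
func matchZeroOrMore(_ inner: InnerMatcher, in input: [Character], from start: Int) -> Int {
  var position = start
  while let next = inner(input, position) {
    // Forward-progress check: a zero-length inner match would otherwise loop forever.
    if next == position { break }
    position = next
  }
  return position
}

// An inner matcher equivalent to `a*`: it can succeed while consuming nothing.
let zeroOrMoreA: InnerMatcher = { input, position in
  var p = position
  while p < input.count, input[p] == "a" { p += 1 }
  return p
}

// Behaves like `(a*)*` applied to "aab": consumes "aa" and then terminates.
let loopEnd = matchZeroOrMore(zeroOrMoreA, in: Array("aab"), from: 0)
print(loopEnd)
```

An inner matcher that always consumes at least one element never triggers the break, which is why the check is only needed when the inner node might not progress and the quantification is unbounded.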
)

- Adds new instructions for matching characters and scalars case-insensitively
- Compiles ASCII character matches into the faster scalar match instructions even in grapheme semantic mode
- Optimizes out unnecessary runtime grapheme boundary checks for all-ASCII strings
- Also includes fixes to scalar matching in grapheme semantic mode (swiftlang#565)
This allows us to catch the case where a match
occurs without optimizations, but doesn't occur
with optimizations. Additionally fix the `xfail`
param such that it can't be used on tests that
actually match expectations.
Replace a couple of `#if os(Linux)` checks with
a check to see if we have a newer stdlib
available. This lets us emit an expected failure
in the case where we're testing on an older
stdlib.
Previously we performed a lexicographic
comparison with the bounds of a character class
range. However this produced surprising results,
and our implementation didn't properly handle
case sensitivity.

Update the logic to instead only allow single
scalar NFC bounds. The input is then converted to
NFC in grapheme semantic mode, and checked against
the range. In scalar semantic mode, the input
scalar is checked on its own. Additionally, fix
the case sensitivity handling such that we check
both the lowercase and uppercase version of the
input against the range.
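The case-insensitive part of this check might look roughly like the sketch below for scalar semantic mode. `rangeContains` is a hypothetical helper, not the library's implementation, and the NFC conversion used in grapheme semantic mode is not shown.

```swift
// Hypothetical helper: is `scalar` within `range`, optionally ignoring case?
func rangeContains(
  _ range: ClosedRange<Unicode.Scalar>,
  _ scalar: Unicode.Scalar,
  caseInsensitive: Bool
) -> Bool {
  if range.contains(scalar) { return true }
  guard caseInsensitive else { return false }

  // Check both the lowercase and uppercase version of the input.
  // Case mappings are strings, so this sketch only considers
  // mappings that consist of a single scalar.
  let variants = [
    scalar.properties.lowercaseMapping,
    scalar.properties.uppercaseMapping,
  ]
  for variant in variants {
    let scalars = variant.unicodeScalars
    if scalars.count == 1, let only = scalars.first, range.contains(only) {
      return true
    }
  }
  return false
}

// "Q" is outside a...z, but its lowercase form "q" is inside.
print(rangeContains("a"..."z", "Q", caseInsensitive: true))
```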
Previously we would emit a series of scalars
written in the DSL as a series of individual
characters in grapheme semantic mode. Change the
behavior such that we coalesce any adjacent
scalars and characters, including those in regex
literals and nested concatenations. We then
perform grapheme breaking over the result, and can
emit character matches for scalars that coalesced
into a grapheme.

This transform subsumes a similar transform we
performed for regex literals when converting them
to a DSLTree. This has the nice side effect of
allowing us to better preserve scalar syntax in
the DSL transform.

rdar://96942688
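The effect of coalescing can be seen with plain `String` APIs. The sketch below is not the compiler transform itself; it only shows, under the assumption of two separately written scalars, how joining them and then applying grapheme breaking yields a single character.

```swift
// Two scalars written separately: "e" followed by U+0301 COMBINING ACUTE ACCENT.
let separateScalars: [Unicode.Scalar] = ["e", "\u{301}"]

// Coalesce the adjacent scalars into one string.
var scalarView = String.UnicodeScalarView()
scalarView.append(contentsOf: separateScalars)
let coalesced = String(scalarView)

// Iterating a String performs grapheme breaking, so the pair forms one Character.
let coalescedCharacters = Array(coalesced)
print(separateScalars.count, coalescedCharacters.count)
```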
Previously we would only match entire characters.
Update to use the generic Character consumer logic
that can handle scalar semantic mode.

rdar://97209131
In grapheme semantic mode, coalesce adjacent
character and scalar members of a custom character
class, over which we can perform grapheme breaking.
This involves potentially re-writing ranges such
that they contain a complete grapheme of adjacent
scalars.
Make sure we throw the right error for ranges
that are invalid in grapheme mode, but are valid
in scalar mode.
I also noticed that `lexQuantifier` could silently
eat trivia if it failed to lex a quantification,
so also fix that.
@hamishknight
Contributor Author

@swift-ci please test

@stephentyrone
Contributor

Hamish, we can close this one out now, right?

@hamishknight
Contributor Author

Yeah

@hamishknight hamishknight deleted the 5.7-test-queue branch July 22, 2022 17:10

4 participants